View Ingested File Content and Artifacts
Learn how to access and review the content of your ingested files in the Airia platform, including processed chunks, generated SQL tables, and the artifacts stored in the Binder. The Binder enhances knowledge retrieval: full text, images, pages, and additional metadata are captured for each file and can be retrieved with native Airia tools. This lets the LLM answer a wider variety of questions and produce more accurate, context-rich responses.
How Airia Processes Your Data
Before you view ingested content, it helps to understand Airia’s multi-stage ingestion pipeline. This process transforms raw documents into AI-ready knowledge, enabling retrieval-augmented generation (RAG) for your AI agents. The pipeline includes:
1. Document Parsing and Chunking
Files are identified, parsed, and split into smaller, manageable pieces called chunks.
2. Vector Embeddings
Each chunk is transformed into a vector embedding, a numerical representation that captures its semantic meaning, so chunks can be searched efficiently by meaning.
3. Image Analysis (Optional)
If enabled, Airia detects and analyzes images within documents and generates AI descriptions of them. This improves document understanding and makes visual content searchable.
4. Text-to-SQL (for CSV and Excel)
For CSV and Excel files, Airia offers a Text-to-SQL capability that converts the file into a queryable SQL table.
5. Artifact Generation (The Binder)
During processing, Airia generates artifacts beyond semantic embeddings, such as the full text of the document, extracted images, and their descriptions. These artifacts are stored in the Binder.
What is the Binder?
The Binder is the collection of artifacts generated for a file during ingestion. It gives AI agents a deeper understanding of your files, letting them answer questions that semantic search alone cannot, such as “How many pages?”, “How many images?”, or “What is on the image on page 3?”. Agents retrieve these artifacts dynamically, when needed. Binder artifacts can include:
- fulltext.md: The full text content of the document.
- pages/: Individual page content (for multi-page documents such as PDFs).
- images/: Extracted images and their AI-generated descriptions.
- SQL Tables: For CSV and Excel files processed with Text-to-SQL.
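To make the artifact layout concrete, here is a minimal Python sketch showing how the questions above could be answered from a Binder's contents rather than from embeddings. The `Binder` and `BinderImage` classes and their fields are hypothetical illustrations, not Airia's actual data model. The sketch also assumes that each extracted image records the page it came from.

```python
from dataclasses import dataclass, field


@dataclass
class BinderImage:
    page: int          # 1-based page the image was extracted from (assumed field)
    description: str   # AI-generated description of the image


@dataclass
class Binder:
    """Hypothetical in-memory model of one file's Binder artifacts."""
    fulltext: str                                            # fulltext.md
    pages: list[str] = field(default_factory=list)           # pages/: one string per page
    images: list[BinderImage] = field(default_factory=list)  # images/

    def page_count(self) -> int:
        # "How many pages?" is a count of page artifacts, not a similarity search.
        return len(self.pages)

    def image_count(self) -> int:
        # "How many images?"
        return len(self.images)

    def images_on_page(self, page: int) -> list[str]:
        # "What is on the image on page 3?" -> descriptions of images from that page.
        return [img.description for img in self.images if img.page == page]


binder = Binder(
    fulltext="Introduction\nMethod\nResults",
    pages=["Introduction", "Method", "Results"],
    images=[BinderImage(page=3, description="Bar chart of quarterly revenue")],
)
print(binder.page_count())       # 3
print(binder.image_count())      # 1
print(binder.images_on_page(3))  # ['Bar chart of quarterly revenue']
```

These questions are answered by counting or looking up artifacts. This is why the Binder complements, rather than replaces, embedding-based retrieval.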
Access Ingested File Content
Follow these steps to view the processed content and artifacts for your ingested files:
- Navigate to the Data Sources tab in the Airia platform.
- Locate the desired data source and click it to view the list of ingested files.
- In the file list, find the specific file you want to examine.
- Click the file name, or select the View content option (if available), to open its detailed view.
- Within the file’s detail view, you will find several tabs:
  - Chunks: Displays the list of text chunks generated from the document.
  - SQL: (If applicable) Shows the SQL table generated for files processed with Text-to-SQL.
  - Binder: Presents a list of all artifacts generated for that specific file.
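The SQL tab shows a table derived from a CSV or Excel file. As a rough illustration of the idea behind Text-to-SQL, the sketch below loads CSV text into an in-memory SQLite table using Python's standard `csv` and `sqlite3` modules. It then runs the kind of query an LLM might generate for a natural-language question. The sketch rests on several assumptions: every column is stored as `TEXT`, the table name `sales` and its data are invented, and Airia's own table schema and query generation are not represented.

```python
import csv
import io
import sqlite3


def csv_to_sql_table(conn: sqlite3.Connection, table: str, csv_text: str) -> list[str]:
    """Create `table` from CSV text (first row = header) and insert the data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def quote(name: str) -> str:
        # Quote identifiers so column names with spaces or quotes still work.
        return '"' + name.replace('"', '""') + '"'

    columns = ", ".join(f"{quote(name)} TEXT" for name in header)
    conn.execute(f"CREATE TABLE {quote(table)} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {quote(table)} VALUES ({placeholders})", data)
    return header


conn = sqlite3.connect(":memory:")
csv_to_sql_table(conn, "sales", "region,amount\nEast,100\nWest,250\nEast,50\n")

# A question such as "What is the total amount for East?" might become this query.
query = "SELECT SUM(CAST(amount AS REAL)) FROM sales WHERE region = ?"
total = conn.execute(query, ("East",)).fetchone()[0]
print(total)  # 150.0
```

A SQL table supports exact aggregates such as sums and counts, which similarity search over text chunks cannot reliably produce.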
Supported Binder Artifacts
The availability of specific artifacts in the Binder depends on the file type and the parser used during ingestion.

| File Type / Parser | fulltext.md | pages/ | images/ |
|---|---|---|---|
| Basic | ✅ | ✅ | ✅ |
| Advanced | ✅ | ✅ | ✅ |
| Universal | ✅ | ✅ | ❌ |
| Intelligent | ✅ | ✅ | ✅ |
| TXT | ✅ | ❌ | ❌ |
| MS Office (docx, pptx, etc.) | ✅ | ❌ | ✅ |
| Excel | ✅ | ❌ | ❌ |
| Images (png, jpg, tiff, etc.) | | | |
| Basic | ✅ | ✅ | ✅ |
| Advanced | ✅ | ✅ | ✅ |
| Universal | ✅ | ✅ | ✅ |
| Intelligent | ✅ | ✅ | ✅ |
| Confluence Page | ✅ | ❌ | ✅ |
| Notion Page | ✅ | ❌ | ✅ |
| Email (Sendgrid, Outlook) | ✅ | ❌ | ❌ |
| YAML | ✅ | ❌ | ❌ |
| XML | ✅ | ❌ | ❌ |
| CSV | ✅ | ❌ | ❌ |
| ServiceNow | ✅ | ❌ | ❌ |
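If you script checks against ingested files, it can help to keep the matrix above in code. The sketch below transcribes the table into a Python dictionary. The row keys (for example `"Images / Basic"` for the Basic parser applied to an image file) and the helper names are hypothetical labels chosen for this example.

```python
ARTIFACTS = ("fulltext.md", "pages/", "images/")

# Transcribed from the table above: (fulltext.md, pages/, images/) per row.
# Row keys are illustrative labels; "Images / <parser>" marks the image-file rows.
SUPPORT = {
    "Basic": (True, True, True),
    "Advanced": (True, True, True),
    "Universal": (True, True, False),
    "Intelligent": (True, True, True),
    "TXT": (True, False, False),
    "MS Office": (True, False, True),
    "Excel": (True, False, False),
    "Images / Basic": (True, True, True),
    "Images / Advanced": (True, True, True),
    "Images / Universal": (True, True, True),
    "Images / Intelligent": (True, True, True),
    "Confluence Page": (True, False, True),
    "Notion Page": (True, False, True),
    "Email": (True, False, False),
    "YAML": (True, False, False),
    "XML": (True, False, False),
    "CSV": (True, False, False),
    "ServiceNow": (True, False, False),
}


def expected_artifacts(row: str) -> list[str]:
    """Artifacts the table says should exist for this file type / parser row."""
    return [name for name, present in zip(ARTIFACTS, SUPPORT[row]) if present]


def missing_artifacts(row: str, found: set[str]) -> list[str]:
    """Expected artifacts that were not found in a Binder listing."""
    return [name for name in expected_artifacts(row) if name not in found]


print(expected_artifacts("Universal"))                  # ['fulltext.md', 'pages/']
print(missing_artifacts("MS Office", {"fulltext.md"}))  # ['images/']
```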
Utilize Binder Knowledge with AI Agents
To let your AI agents access and use the detailed information in the Binder, configure the following tools in your project:
- Navigate to the MCP & Tools tab in your Airia project.
- Click the Add new tool button.
- Search for and select the following tools:
  - Binder Content Retrieval
  - List artifacts in Binder
  - List folders in Binder

💡 Note: When the Binder Content Retrieval tool is used to retrieve images, the tool response is added to the LLM context as content of type ‘image’. This allows multi-modal LLMs to process the image natively and answer questions about visual content.
- Add these tools to your project. The tools themselves require no specific configuration.
- Attach these tools to your Language Model (LLM).
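The three tools can be pictured as a list, browse, and retrieve flow over a file's Binder. The sketch below models that flow with plain Python functions over an in-memory dictionary. The function names, the `pages/1.md` path, and the shape of the returned blocks are hypothetical and only loosely mirror how multi-modal chat APIs represent image content. The sketch illustrates the note above: text artifacts come back as text, while images come back as a block of type `image` (base64-encoded here) that a multi-modal LLM could process natively.

```python
import base64
import mimetypes

# Hypothetical Binder for one file: artifact path -> raw bytes.
BINDER = {
    "fulltext.md": b"# Quarterly Report\nRevenue grew in Q3.",
    "pages/1.md": b"Revenue grew in Q3.",
    "images/chart.png": b"\x89PNG\r\n\x1a\n(placeholder image bytes)",
}


def list_folders(binder: dict[str, bytes]) -> list[str]:
    """Top-level folders, e.g. 'pages/' and 'images/'."""
    return sorted({path.split("/", 1)[0] + "/" for path in binder if "/" in path})


def list_artifacts(binder: dict[str, bytes], folder: str = "") -> list[str]:
    """Artifact paths, optionally restricted to one folder."""
    return sorted(path for path in binder if path.startswith(folder))


def retrieve_content(binder: dict[str, bytes], path: str) -> dict[str, str]:
    """Return text artifacts as text blocks and images as image blocks."""
    data = binder[path]
    media_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    if media_type.startswith("image/"):
        # Image blocks carry base64 data so a multi-modal LLM can view the image.
        return {"type": "image", "media_type": media_type,
                "data": base64.b64encode(data).decode("ascii")}
    return {"type": "text", "text": data.decode("utf-8")}


print(list_folders(BINDER))                                  # ['images/', 'pages/']
print(list_artifacts(BINDER, "images/"))                     # ['images/chart.png']
print(retrieve_content(BINDER, "images/chart.png")["type"])  # image
```

Splitting discovery (folders, artifacts) from retrieval lets an agent fetch only the artifact a question needs, rather than loading the whole Binder into context.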
